Chronic heart failure has an average survival of less than one year after diagnosis, although some studies show that patients may live, on average, five years post-diagnosis [Final Stages of Heart Failure: End-Stage Heart Failure, 2020]. Predicting life span from patient characteristics is therefore essential: it can prompt more frequent patient interactions and earlier transitions to palliative and supportive care, improving patients' quality of life. The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a publicly available resource for intensive care research. Using it, we included factors such as age group and SAPS score to relate mortality rates to chronic heart failure (CHF).
Dataset
The dataset is derived from MIMIC-II, the publicly accessible critical care database. It contains a summary of clinical data and outcomes for 1,776 patients. The dataset (full_cohort_data.csv) is a comma-separated values (CSV) file with a header row of descriptive variable names.
Accessing the Dataset
The dataset, "Clinical data from the MIMIC-II database for a case study on indwelling arterial catheters", is available on PhysioNet: https://physionet.org/content/mimic2-iaccd/1.0/
Primary Usage of the Dataset
The primary use of this dataset is the case study in Chapter 16 of Secondary Analysis of Electronic Health Records. The case study walks the reader through examining the effect of indwelling arterial catheters (IAC) on 28-day mortality in the intensive care unit (ICU) among patients who were mechanically ventilated during the first day of ICU admission.
Strengths and Weaknesses of the Dataset
Because the dataset is derived from MIMIC, a reputable source of medical data, it can be considered trustworthy. It is also openly accessible to everyone. It is extensive, spanning numerous attributes such as physiological parameters, laboratory measurements, and disease presence, and its data dictionary is self-explanatory. Most importantly, the data does not contain any missing values, and this completeness is a major advantage for any analysis. Despite the cleanliness and completeness of the dataset, its 1,776 instances are a limited sample for in-depth analysis and model building. With more patients recorded, the subsequent study and its findings would be more inclusive and meaningful.
SAPS Scores
SAPS III admission scores categorize patients by their risk of a poor prognosis. The scoring criteria fall into three categories: patient characteristics before ICU admission, the circumstances of ICU admission, and the presence and degree of physiological derangement at ICU admission. Scoring patients on these criteria supports prompt, well-targeted care aimed at improving patient outcomes and satisfaction.
Population
MIMIC-II patients with a SAPS score between 5 and 15. The SAPS score reflects the patient's risk of mortality in the ICU based on the severity of the disease condition.
Intervention or Exposure Variable
Congestive heart failure (chf_flg) is a binary variable, where 1 indicates that the patient has CHF and 0 indicates that the patient does not.
Comparison
We aim to compare patients with and without congestive heart failure. Congestive heart failure and chronic renal disease had a correlation of 0.2475 with mortality, relatively higher than the other variables in the dataset, which led us to choose congestive heart failure as the exposure variable and chronic renal disease as the confounder.
Outcome Variable
The outcome variable is death or censoring (censor_flg), a binary variable equal to 0 for death and 1 for a censored observation. Because the SAPS score is itself an indicator of mortality risk, mortality was the most relevant choice of outcome.
Confounder(s)
The dataset contains several potential confounders. We chose the categorical variable chronic renal disease (renal_flg). Chronic kidney disease (CKD) is medically observed to increase the chance of having heart disease [American Kidney Fund. (2022, February 15)]: CKD can cause heart disease, and heart disease can cause CKD. In fact, heart disease is the most common cause of death among people on dialysis. Renal disease can therefore affect both the exposure (heart disease) and the outcome (mortality). Another confounder is the continuous variable hemoglobin (hgb_first), measured when the patient is admitted to the ICU. Reduced hemoglobin in patients with congestive heart failure (CHF) has been shown to be independently associated with an increased risk of hospitalization and all-cause mortality, and findings suggest a linear association between reduced hemoglobin and increased mortality risk. In studies that analyzed hemoglobin as a continuous variable, each 1 g/dL decrease in hemoglobin was independently associated with a significantly increased mortality risk [Tang, Y. D., & Katz, S. D. (2006)].
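To illustrate why a confounder such as renal_flg matters, here is a minimal simulated sketch (hypothetical data, not MIMIC-II; death is coded 1 here for readability, unlike censor_flg). Renal disease raises both the probability of CHF and the probability of death, while CHF has no direct effect, so the crude CHF coefficient is biased and adjusting for renal disease moves it toward zero.

```r
# Simulated illustration of confounding (hypothetical data, not MIMIC-II).
# Renal disease raises both the chance of CHF and the chance of death;
# CHF itself has no effect on death in this simulation.
set.seed(42)
n <- 5000
renal <- rbinom(n, 1, 0.3)                       # confounder
chf   <- rbinom(n, 1, 0.2 + 0.4 * renal)         # exposure depends on renal disease
death <- rbinom(n, 1, plogis(-1 + 1.5 * renal))  # outcome depends only on renal disease

crude    <- glm(death ~ chf,         family = binomial)
adjusted <- glm(death ~ chf + renal, family = binomial)

# Log-odds ratio for CHF without and with adjustment for renal disease
crude_chf    <- unname(coef(crude)["chf"])
adjusted_chf <- unname(coef(adjusted)["chf"])
print(round(c(crude = crude_chf, adjusted = adjusted_chf), 3))
```

With the real columns, an adjusted model would take a form such as glm(censor_flg ~ chf_flg + renal_flg, family = binomial), with hemoglobin added in the same way.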
After importing the data and setting up the environment, we sorted and cleaned the data and checked for missingness using the naniar package. We then performed exploratory data analysis on various variables (both categorical and continuous) to form hypotheses for the question. Next, we used clustering to find the optimal number of clusters, PCA to examine the dimensionality of the dataset, and the Boruta package for feature selection, before moving on to model selection and performance evaluation.
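A base-R sketch of the missingness check follows; the analysis itself used the naniar package, which summarizes missing values per variable. The small data frame below is a hypothetical stand-in for full_cohort_data.csv.

```r
# Hypothetical stand-in for the real file, which would be loaded with
# cohort <- read.csv("full_cohort_data.csv")
cohort <- data.frame(
  age         = c(65, 72, NA, 58, 80),
  sapsi_first = c(12, NA, 9, 14, NA),
  chf_flg     = c(1, 0, 0, 1, 0),
  censor_flg  = c(0, 1, 1, 1, 0)
)

missing_per_column <- colSums(is.na(cohort))       # NA count for each variable
total_missing      <- sum(is.na(cohort))           # NA count for the whole table
complete_rows      <- sum(complete.cases(cohort))  # rows with no NA at all

print(missing_per_column)
print(total_missing)
print(complete_rows)
```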
Question of Interest
To find the mortality rate, by age group, among ICU patients with a SAPS score between 5 and 15, with and without heart disease.
## [1] 0
Interpretation & Analysis: Our preliminary data exploration and visualization suggested that higher WBC counts, which the literature associates with immunity, relate to shorter ICU stays. Most patients with chronic diseases whose initial WBC count on the first day of ICU admission was between 0 and 30 stayed in the ICU longer than those whose initial WBC count was greater than 30.
## age gender_num sapsi_first chf_flg censor_flg
## age 1.00000000 -0.13808683 0.217115435 0.285634843 -0.41506929
## gender_num -0.13808683 1.00000000 -0.067136553 -0.067469341 0.03413822
## sapsi_first 0.21711543 -0.06713655 1.000000000 0.061033686 -0.16957742
## chf_flg 0.28563484 -0.06746934 0.061033686 1.000000000 -0.16028429
## censor_flg -0.41506929 0.03413822 -0.169577419 -0.160284289 1.00000000
## renal_flg 0.14029191 0.05972044 0.100854089 0.257320151 -0.04454071
## wbc_first -0.13362334 0.04053159 0.016360147 -0.064126509 0.01257482
## hgb_first -0.09391657 0.11932993 -0.157179505 -0.110309782 0.08484022
## icu_los_day 0.01159406 0.04297427 0.066149587 0.088013720 -0.05456521
## hospital_los_day -0.07099685 0.05962433 0.001499691 -0.002386556 0.09456228
## renal_flg wbc_first hgb_first icu_los_day
## age 0.14029191 -0.133623339 -0.093916573 0.011594062
## gender_num 0.05972044 0.040531586 0.119329935 0.042974266
## sapsi_first 0.10085409 0.016360147 -0.157179505 0.066149587
## chf_flg 0.25732015 -0.064126509 -0.110309782 0.088013720
## censor_flg -0.04454071 0.012574818 0.084840225 -0.054565214
## renal_flg 1.00000000 -0.105679041 -0.077274127 -0.025475375
## wbc_first -0.10567904 1.000000000 0.031932388 0.009156714
## hgb_first -0.07727413 0.031932388 1.000000000 0.046753674
## icu_los_day -0.02547538 0.009156714 0.046753674 1.000000000
## hospital_los_day -0.02339171 -0.035493384 -0.005714846 0.565693453
## hospital_los_day
## age -0.070996847
## gender_num 0.059624334
## sapsi_first 0.001499691
## chf_flg -0.002386556
## censor_flg 0.094562283
## renal_flg -0.023391715
## wbc_first -0.035493384
## hgb_first -0.005714846
## icu_los_day 0.565693453
## hospital_los_day 1.000000000
Interpretation & Analysis: We plotted this correlation matrix to find the variable most strongly correlated with the number of days spent in the ICU. ICU length of stay (icu_los_day) and hospital length of stay (hospital_los_day) are the most strongly correlated pair in the matrix, with a correlation of 0.57.
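The correlation matrix above can be produced with base R's cor(). This sketch uses simulated data (an assumption; the real columns come from full_cohort_data.csv) and also picks out the variable most correlated with icu_los_day.

```r
# Hypothetical data: hospital stay is built from ICU stay plus extra ward days,
# so the two should correlate strongly; the other columns are independent noise.
set.seed(1)
n <- 300
icu_los_day      <- rexp(n, rate = 1 / 4)
hospital_los_day <- icu_los_day + rexp(n, rate = 1 / 5)
wbc_first        <- rnorm(n, mean = 11, sd = 4)
hgb_first        <- rnorm(n, mean = 12, sd = 2)
df <- data.frame(icu_los_day, hospital_los_day, wbc_first, hgb_first)

# Pearson correlation matrix, the same kind of matrix printed above
cor_mat <- cor(df)

# Correlations with ICU length of stay, excluding its correlation with itself
with_icu <- cor_mat["icu_los_day", colnames(cor_mat) != "icu_los_day"]
top_var  <- names(which.max(abs(with_icu)))
print(round(with_icu, 3))
print(top_var)
```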
Interpretation & Analysis: Because ICU and hospital length of stay are strongly associated, we built a 3D density plot relating age and mortality to the length of stay in the ICU and in the hospital. Green dots are patients who were alive, and blue dots are patients who died. Patients with chronic diseases who are older than 80 tend to have shorter hospital and ICU stays and are more likely to die. Patients aged 20 to 50 who are admitted to the ICU have varying lengths of ICU and hospital stay and are more likely to be alive after ICU admission.
Interpretation & Analysis: After dividing the dataset into age groups below and above 60 years, with and without heart disease, we found that mortality rates are higher in the over-60 age group in general, and especially among patients with heart disease.
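The age-by-CHF mortality comparison can be sketched in base R as follows. The rows are hypothetical, the cut-off places patients aged exactly 60 in the "60+" group (an assumption), and death is derived from censor_flg = 0 as defined in the Outcome Variable section.

```r
# Hypothetical cohort rows; censor_flg = 0 means death, 1 means censored
cohort <- data.frame(
  age         = c(45, 70, 82, 55, 66, 73, 38, 61, 90, 50),
  sapsi_first = c(10, 12,  8, 14,  4, 11,  9, 15, 13, 16),
  chf_flg     = c( 0,  1,  1,  0,  1,  0,  1,  1,  0,  0),
  censor_flg  = c( 1,  0,  0,  1,  0,  1,  1,  0,  0,  1)
)

# Study population: SAPS score between 5 and 15 inclusive
pop <- subset(cohort, sapsi_first >= 5 & sapsi_first <= 15)

# Age groups split at 60 years and a 0/1 death indicator
pop$agegroup <- ifelse(pop$age >= 60, "60+", "<60")
pop$death    <- as.integer(pop$censor_flg == 0)

# Mortality rate (share of deaths) in each age group x CHF stratum
mortality <- aggregate(death ~ agegroup + chf_flg, data = pop, FUN = mean)
print(mortality)
```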
## K-means clustering with 3 clusters of sizes 113, 145, 498
##
## Cluster means:
## gender_num sapsi_first chf_flg censor_flg renal_flg wbc_first hgb_first
## 1 0.6017699 17.76106 0.1504425 0.6725664 0.05309735 11.682035 12.25841
## 2 0.5241379 18.19310 0.1241379 0.6068966 0.02068966 22.245517 11.92897
## 3 0.5060241 17.69277 0.1847390 0.5702811 0.06224900 9.735984 11.95060
## icu_los_day hospital_los_day
## 1 9.340708 23.823009
## 2 3.718828 7.682759
## 3 3.140502 6.433735
##
## Clustering vector:
## [1] 3 2 3 3 1 1 3 2 2 3 1 3 2 3 1 3 3 3 3 3 3 3 3 3 3 3 3 2 3 1 1 3 3 3 3 3 3
## [38] 1 1 1 2 3 3 3 3 1 1 3 3 2 3 1 3 3 2 3 3 1 2 3 3 3 3 2 3 3 2 3 3 2 3 1 3 3
## [75] 3 3 3 3 1 3 3 3 3 3 2 3 3 2 1 3 3 2 3 3 3 1 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3
## [112] 3 2 3 3 2 3 3 3 3 1 1 3 3 3 3 2 2 2 2 3 3 1 2 3 3 3 3 3 1 1 3 3 2 3 3 3 1
## [149] 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 2 3 1 3 3 3 2 3 3 3 3 3 3 2 3 3 3 3 3 3
## [186] 2 3 3 1 3 3 3 3 3 1 3 2 3 3 1 1 3 3 3 3 3 2 3 2 3 1 3 3 3 3 3 1 1 1 3 2 1
## [223] 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 2 3 3 1 2 2 3 3 3 3 1 3 3 3 3 1 2
## [260] 3 3 3 3 3 3 1 2 1 3 1 2 2 3 3 1 3 2 2 3 2 3 1 2 2 3 2 3 3 2 2 3 2 1 2 3 3
## [297] 3 2 1 3 3 3 1 3 3 3 2 3 2 2 3 3 3 3 3 3 2 3 3 2 2 3 3 3 3 3 1 2 2 2 3 3 3
## [334] 3 3 3 3 3 2 3 3 3 3 2 3 2 1 3 3 2 3 3 3 3 2 1 3 3 3 3 2 3 3 3 3 3 3 1 3 3
## [371] 2 3 3 2 1 1 2 1 3 3 3 3 1 3 2 1 2 3 2 3 3 2 3 3 3 1 3 1 3 2 2 3 3 3 2 2 3
## [408] 3 3 2 1 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 1 3 2 3 2 3 3 3 1 1 2 3
## [445] 1 3 1 3 3 3 3 3 3 2 3 3 3 3 2 1 2 1 3 1 3 3 3 3 2 1 1 2 2 3 3 3 3 3 3 3 2
## [482] 2 3 1 3 3 2 3 3 3 3 1 2 3 3 1 2 3 2 3 3 3 3 2 3 3 3 3 3 3 1 1 1 1 3 3 1 1
## [519] 2 3 3 1 3 1 2 3 2 3 1 3 3 2 3 1 3 3 3 3 3 3 1 1 3 2 1 3 3 2 2 3 3 2 1 1 1
## [556] 3 3 3 3 3 3 3 3 3 3 3 3 1 3 2 3 2 3 1 3 3 3 3 3 3 2 3 1 3 1 3 3 3 2 3 3 3
## [593] 3 2 3 3 3 3 1 2 3 1 2 3 3 3 3 2 3 2 3 3 3 3 3 2 3 3 3 1 3 3 3 3 3 3 2 2 3
## [630] 3 3 3 3 2 3 1 3 3 1 2 3 3 3 2 2 3 1 2 3 1 3 3 3 1 3 3 3 3 1 1 2 3 3 3 3 3
## [667] 3 2 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 2 3 2 1 3 3 1 2 3 3 1 2 3 3 3 3 3 1 1
## [704] 3 3 3 3 3 3 3 3 3 3 3 3 2 2 3 3 2 2 3 3 2 3 3 3 1 3 2 3 2 2 1 1 3 1 2 3 1
## [741] 3 3 3 3 3 2 3 3 3 3 3 3 2 1 3 1
##
## Within cluster sum of squares by cluster:
## [1] 23957.65 20561.73 20291.51
## (between_SS / total_SS = 43.3 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
Interpretation & Analysis: The bar plot for the k-means clustering algorithm shows how much of the data is represented using 1, 2, or 3 clusters. Two or three clusters appear to represent the data adequately; the 3-cluster solution printed above explains 43.3% of the total sum of squares (between_SS / total_SS). The silhouette plot indicates that the optimal number of clusters is 2.
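The elbow idea behind the k-means bar plot can be sketched in base R: run kmeans() for several values of k and watch how the total within-cluster sum of squares falls. The data are simulated (hypothetical, with three clear groups), so the numbers will not match the output above; the silhouette plot would typically come from the cluster package and is not reproduced here.

```r
# Hypothetical 2-D data with three well-separated groups (not MIMIC-II)
set.seed(7)
centers_true <- rbind(c(0, 0), c(10, 0), c(0, 10))
x <- do.call(rbind, lapply(1:3, function(i)
  cbind(rnorm(50, centers_true[i, 1]), rnorm(50, centers_true[i, 2]))))
x <- scale(x)  # standardize so no variable dominates the distance

# Total within-cluster sum of squares for k = 1..5 (the "elbow" curve)
wss <- sapply(1:5, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# Share of variance explained by a 3-cluster solution: the same
# between_SS / total_SS ratio that printing a kmeans fit reports
fit3   <- kmeans(x, centers = 3, nstart = 10)
ratio3 <- fit3$betweenss / fit3$totss
print(round(wss, 1))
print(round(ratio3, 3))
```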
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## Standard deviation 1.378 1.0687 1.0545 1.0031 0.9767 0.9501 0.90159 0.80364
## Proportion of Variance 0.211 0.1269 0.1235 0.1118 0.1060 0.1003 0.09032 0.07176
## Cumulative Proportion 0.211 0.3379 0.4615 0.5733 0.6793 0.7796 0.86988 0.94165
## PC9
## Standard deviation 0.72470
## Proportion of Variance 0.05835
## Cumulative Proportion 1.00000
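Interpretation & Analysis: The score plots show the projection of the data onto the span of the principal components; scores far from the origin are either outliers or naturally extreme observations. Most data points have a first principal component score near 0, whereas a few have scores around -2. The variance is spread fairly evenly across components: PC1 explains only 21.1% of it, and eight of the nine components are needed to reach 94.2%, so the variables do not reduce to a few dimensions.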
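Each proportion of variance in the PCA summary is that component's variance divided by the total: with standard deviations s_j, the proportion for component j is s_j^2 / sum_k s_k^2. The sketch below applies this to the reported standard deviations (their squares sum to about 9, consistent with nine standardized variables) and to a prcomp() fit on hypothetical random data.

```r
# Proportion of variance explained from principal-component standard deviations
var_explained <- function(sdev) {
  v <- sdev^2
  v / sum(v)
}

# Check against the standard deviations reported in the summary above
reported_sdev <- c(1.378, 1.0687, 1.0545, 1.0031, 0.9767,
                   0.9501, 0.90159, 0.80364, 0.72470)
prop <- var_explained(reported_sdev)
print(round(prop, 3))
print(round(cumsum(prop), 3))

# The same calculation on a prcomp fit (hypothetical random data, standardized)
set.seed(3)
m <- matrix(rnorm(200 * 4), ncol = 4)
pca <- prcomp(m, scale. = TRUE)
prop_pca <- var_explained(pca$sdev)
print(round(prop_pca, 3))
```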
## Boruta performed 99 iterations in 31.98341 secs.
## 8 attributes confirmed important: age, agegroup, censor_flg,
## hgb_first, icu_los_day and 3 more;
## 1 attributes confirmed unimportant: gender_num;
## 3 tentative attributes left: chf_flg, heart_failure, hospital_los_day;
Using the tentative and confirmed important attributes, we selected sapsi_first, heart_failure, and mortality.
## age gender_num chf_flg censor_flg renal_flg wbc_first hgb_first
## [1,] 12.12949 -0.9084812 1.762399 6.906130 4.389413 4.161883 7.152791
## [2,] 13.14845 2.2868219 2.122811 8.523155 4.142409 3.319650 7.362015
## [3,] 11.18096 0.6705264 3.687560 4.390831 4.074064 3.197270 5.621322
## [4,] 12.27479 0.8830721 2.278991 7.654170 5.412227 3.430496 7.397266
## [5,] 12.29947 0.2566831 1.720871 6.827953 7.079582 3.454351 6.018177
## [6,] 11.96974 -0.6589906 2.050596 5.877351 4.497742 2.564088 8.042439
## icu_los_day hospital_los_day agegroup heart_failure mortality
## [1,] 5.499151 4.8730066 6.538453 2.7255033 7.613152
## [2,] 3.897263 1.5475419 6.804577 2.3191152 10.827528
## [3,] 5.478581 0.9876445 6.977166 2.5511299 7.436863
## [4,] 6.337409 3.7439084 6.493189 0.6335887 8.681894
## [5,] 4.225862 1.1968753 8.459238 1.9928457 6.464950
## [6,] 6.070956 0.1418116 8.267406 0.5236195 5.262396
## Boruta performed 99 iterations in 31.98341 secs.
## Tentatives roughfixed over the last 99 iterations.
## 10 attributes confirmed important: age, agegroup, censor_flg, chf_flg,
## heart_failure and 5 more;
## 2 attributes confirmed unimportant: gender_num, hospital_los_day;
Interpretation & Analysis: We excluded from the analysis all rejected features, whose importance is recorded as infinite. We then sorted the remaining (non-rejected) features by median importance and plotted them with plotly as boxplots. In this box-and-whisker plot, each variable's median, quartiles, minimum, and maximum are visible, which shows the range of importance scores within a single variable and helps decide which variables are tentative and which are important. It may be desirable to remove tentative features.
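A base-R sketch of the sorting step follows, assuming, as described above, that rejected features carry infinite importance values in an iterations-by-features importance history. The matrix here is hypothetical; Boruta keeps its history in a similar matrix (ImpHistory). The sketch keeps only fully finite columns and orders them by median importance.

```r
# Hypothetical importance history: rows are iterations, columns are features.
# The rejected feature is marked with -Inf, as assumed above.
imp_history <- cbind(
  age         = c(12.1, 13.1, 11.2, 12.3),
  hgb_first   = c( 7.2,  7.4,  5.6,  7.4),
  icu_los_day = c( 5.5,  3.9,  5.5,  6.3),
  gender_num  = c(-0.9,  2.3, -Inf, -Inf)
)

# Drop features that contain non-finite (rejected) importances
keep <- apply(imp_history, 2, function(col) all(is.finite(col)))
imp_kept <- imp_history[, keep, drop = FALSE]

# Sort the remaining features by median importance, highest first
med <- apply(imp_kept, 2, median)
ord <- order(med, decreasing = TRUE)
imp_sorted <- imp_kept[, ord, drop = FALSE]
print(round(med[ord], 2))
```

Calling boxplot(imp_sorted, las = 2) on the sorted matrix would give a static version of the box-and-whisker view described above.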
summary(without_conf_n$sapsi_first)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.00000 0.05882 0.11765 0.16464 0.23529 1.00000
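The 0-to-1 range of sapsi_first suggests the variables were min-max scaled before modeling; this is an inference, not something the source states. A minimal sketch of that scaling and its inverse follows. The inverse is what converting a scaled tree split back to original units requires. The raw scores and the 18-to-98 age range below are hypothetical placeholders.

```r
# Min-max scaling to [0, 1] (assumed, not confirmed, to match the preprocessing)
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Map a scaled value back to original units, given the original min and max
unscale <- function(z, orig_min, orig_max) orig_min + z * (orig_max - orig_min)

sapsi <- c(5, 8, 12, 22)      # hypothetical raw SAPS scores
print(min_max(sapsi))

# Converting a tree split on scaled age back to years needs the original
# age range; 18 and 98 are hypothetical placeholders, not MIMIC-II values.
age_cut_years <- unscale(0.5790757, orig_min = 18, orig_max = 98)
print(round(age_cut_years, 1))
```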
## n= 613
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 613 240 1 (0.3915171 0.6084829)
## 2) age>=0.5790757 347 153 0 (0.5590778 0.4409222)
## 4) age>=0.8433225 66 16 0 (0.7575758 0.2424242)
## 8) icu_los_day>=0.01825741 59 11 0 (0.8135593 0.1864407) *
## 9) icu_los_day< 0.01825741 7 2 1 (0.2857143 0.7142857) *
## 5) age< 0.8433225 281 137 0 (0.5124555 0.4875445)
## 10) age>=0.6998215 171 74 0 (0.5672515 0.4327485)
## 20) icu_los_day>=0.1878163 34 8 0 (0.7647059 0.2352941) *
## 21) icu_los_day< 0.1878163 137 66 0 (0.5182482 0.4817518)
## 42) icu_los_day< 0.06381056 70 26 0 (0.6285714 0.3714286) *
## 43) icu_los_day>=0.06381056 67 27 1 (0.4029851 0.5970149) *
## 11) age< 0.6998215 110 47 1 (0.4272727 0.5727273)
## 22) age< 0.6716549 71 35 0 (0.5070423 0.4929577)
## 44) icu_los_day< 0.1885394 55 23 0 (0.5818182 0.4181818)
## 88) age>=0.642669 22 6 0 (0.7272727 0.2727273) *
## 89) age< 0.642669 33 16 1 (0.4848485 0.5151515)
## 178) age< 0.6224703 21 8 0 (0.6190476 0.3809524) *
## 179) age>=0.6224703 12 3 1 (0.2500000 0.7500000) *
## 45) icu_los_day>=0.1885394 16 4 1 (0.2500000 0.7500000) *
## 23) age>=0.6716549 39 11 1 (0.2820513 0.7179487) *
## 3) age< 0.5790757 266 46 1 (0.1729323 0.8270677) *
Interpretation & Analysis: We split the dataset into 80% training data and 20% test data, then used rpart to construct the classification tree shown above, which reveals the features the algorithm used to classify observations. Age and icu_los_day emerge as the most important variables for recursive partitioning. At the root, patients with a scaled age of at least 0.579 fall into a node whose majority outcome is 0 (death; 56% of its 347 patients), and most of these patients (206 of 347) end in terminal nodes that predict 0. Patients below that threshold are mostly censored (83% of 266). Note that the tree splits on scaled age: the 0.579 threshold equals 58 years only if age was scaled by dividing by 100; under min-max scaling, converting it to years requires the original minimum and maximum age.
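The 80/20 split can be sketched in base R by sampling row indices. The data frame is hypothetical; the names training and testing match those used in the logistic regression code later in this section.

```r
# 80/20 train-test split by random row sampling (hypothetical data frame)
set.seed(123)
df <- data.frame(age = runif(100, 18, 95), censor_flg = rbinom(100, 1, 0.6))

n_train   <- floor(0.8 * nrow(df))                       # 80% of rows for training
train_idx <- sample(seq_len(nrow(df)), size = n_train)   # random row indices
training  <- df[train_idx, ]
testing   <- df[-train_idx, ]                            # the remaining 20%
```

With the rpart package, a classification tree could then be fit with a call such as rpart(factor(censor_flg) ~ ., data = training, method = "class").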
kNN confusion table on the test set (rows: observed normalized sapsi_first; columns: classifier_knn predictions; value labels rounded to three decimals):
##       classifier_knn
##       0.000 0.059 0.118 0.176 0.235 0.294 0.353 0.412 0.471 0.529 0.588 0.647 0.706 0.824 1.000
## 0.000    17     8     3     0     1     0     0     0     0     0     0     0     0     0     0
## 0.059    17     8     5     3     1     0     0     0     0     0     0     0     0     0     0
## 0.118     7     9     2     2     3     2     0     0     0     0     0     0     0     0     0
## 0.176     0     3     6     3     1     3     0     0     0     0     0     0     0     0     0
## 0.235     0     5     1     6     2     3     1     0     0     0     0     0     0     0     0
## 0.294     0     0     0     1     2     2     2     0     0     0     0     0     0     0     0
## 0.353     0     0     0     1     3     0     0     0     0     0     0     0     0     0     0
## 0.412     0     0     0     1     2     1     1     0     0     0     0     0     0     0     0
## 0.471     0     0     0     0     1     2     2     0     0     0     0     0     0     0     0
## 0.529     0     0     0     0     0     1     1     1     0     0     0     0     0     0     0
## 0.588     0     0     0     0     0     0     1     0     0     0     0     0     0     0     0
## 0.647     0     0     0     0     1     0     0     0     0     0     0     0     0     0     0
## 0.706     0     0     0     0     0     0     1     1     0     0     0     0     0     0     0
## 0.882     0     0     0     0     0     0     0     0     1     0     0     0     0     0     0
## [1] "Accuracy = 0.225165562913907"
Interpretation & Analysis: We trained the model with several values of k. The run shown above reached an accuracy of 22.5%, and the highest accuracy we obtained was 31.7% (k = 2), which is very low. We therefore tried training a neural network instead.
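For reference, here is a minimal base-R k-nearest-neighbours classifier (Euclidean distance, majority vote) with the exact-match accuracy used above. This is an illustrative sketch on toy points, not the code behind the results; the class package's knn() is the usual ready-made implementation.

```r
# Minimal k-nearest-neighbours classifier: Euclidean distance, majority vote.
# Features should be on comparable scales (e.g. min-max scaled) before use.
knn_predict <- function(train_x, train_y, test_x, k = 3) {
  train_x <- as.matrix(train_x)
  test_x  <- as.matrix(test_x)
  apply(test_x, 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))  # distance to every training row
    nearest <- train_y[order(d)[seq_len(k)]]  # labels of the k closest rows
    votes <- table(nearest)
    names(votes)[which.max(votes)]            # ties go to the first label in sorted order
  })
}

# Exact-match accuracy, as in the "Accuracy = ..." line above
accuracy <- function(actual, predicted) {
  mean(as.character(actual) == as.character(predicted))
}

# Toy example: two well-separated groups of points
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
train_y <- c("a", "a", "a", "b", "b", "b")
test_x  <- rbind(c(0.5, 0.5), c(10.5, 10.5))
pred <- knn_predict(train_x, train_y, test_x, k = 3)
print(pred)
print(accuracy(c("a", "b"), pred))
```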
## [,1]
## [1,] 0.5086503
## Setting default kernel parameters
## Support Vector Machine object of class "ksvm"
##
## SV type: eps-svr (regression)
## parameter : epsilon = 0.1 cost C = 1
##
## Linear (vanilla) kernel function.
##
## Number of Support Vectors : 507
##
## Objective Function Value : -422.336
## Training error : 1.213125
## agreement
## FALSE
## 1
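Interpretation & Analysis: Unfortunately, this model performs even worse than the previous one. The SVM may have performed poorly because the data are noisy, that is, the target classes overlap; another possible reason is that the dataset has linear features. Note also that the output reports SV type eps-svr (regression): ksvm fits a regression model when the target is numeric, so its continuous predictions would rarely match the observed labels exactly, which is consistent with the agreement table containing only FALSE. Converting the target to a factor would make ksvm fit a classification model instead.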
## age gender_num sapsi_first chf_flg
## 0.23401548 0.49976343 0.15878730 0.37410439
## censor_flg renal_flg wbc_first hgb_first
## 0.49167711 0.22400212 0.06767688 0.13801630
## icu_los_day hospital_los_day
## 0.13859862 0.08316255
##
## Call:
## glm(formula = censor_flg ~ age + chf_flg + sapsi_first + icu_los_day,
## family = "binomial", data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4585 -1.0067 0.4100 0.9296 1.8380
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.7184 0.3748 9.922 <2e-16 ***
## age -4.6337 0.5304 -8.736 <2e-16 ***
## chf_flg -0.1168 0.2437 -0.479 0.6317
## sapsi_first -1.6064 0.5870 -2.737 0.0062 **
## icu_los_day -0.9380 0.6642 -1.412 0.1579
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 820.71 on 612 degrees of freedom
## Residual deviance: 684.09 on 608 degrees of freedom
## AIC: 694.09
##
## Number of Fisher Scoring iterations: 4
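The AIC in this output can be checked by hand: for logistic regression on a 0/1 outcome, AIC equals the residual deviance plus twice the number of estimated coefficients, here 684.09 + 2 × 5 = 694.09. A sketch confirming the identity on a simulated fit (hypothetical data):

```r
# For logistic regression on a 0/1 outcome the saturated log-likelihood is 0,
# so AIC = residual deviance + 2 * (number of estimated coefficients).
reported_aic <- 684.09 + 2 * 5   # values from the glm output above

# The same identity on a simulated fit (hypothetical data)
set.seed(11)
n <- 400
age        <- runif(n)
chf_flg    <- rbinom(n, 1, 0.3)
censor_flg <- rbinom(n, 1, plogis(2 - 3 * age))
fit <- glm(censor_flg ~ age + chf_flg, family = binomial)

aic_from_deviance <- deviance(fit) + 2 * length(coef(fit))
print(reported_aic)
print(c(AIC = AIC(fit), from_deviance = aic_from_deviance))
```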
# calculate the predicted probability that censor_flg = 1 (censored) for each patient in the test dataset
predicted <- predict(mylogit, testing, type = "response")
# calculate the AUC
library(pROC)
auc(testing$censor_flg, predicted)
## Area under the curve: 0.7284
Interpretation & Analysis: A higher area under the ROC curve (AUC) indicates a better ability of the model to distinguish between the positive and negative classes. With an AUC of 0.73, the model discriminates moderately well and could be improved. The Akaike information criterion (AIC) of this model is 694.09; AIC is used to compare models fitted to the same data, and a smaller value indicates a better balance between fit and complexity.
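AUC can also be computed without pROC using its rank (Mann-Whitney) form: the probability that a randomly chosen case labeled 1 gets a higher score than a randomly chosen case labeled 0, with ties counting half. A minimal sketch, assuming higher scores indicate the class coded 1 (pROC can instead choose the comparison direction automatically):

```r
# AUC via ranks: (sum of positive-class ranks - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc_rank <- function(labels, scores) {
  r     <- rank(scores)              # average ranks, so ties contribute half
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
labels <- c(1, 0, 1, 0)
scores <- c(0.8, 0.6, 0.4, 0.2)
print(auc_rank(labels, scores))
```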
We trained four models on the dataset: kNN, SVM, a neural network, and logistic regression. Logistic regression performed best, with an AUC of 0.73 for predicting mortality from age, SAPS score at ICU admission, whether the patient has congestive heart failure (chf_flg = 1 or chf_flg = 0), and ICU length of stay. An AUC of 0.73 does not mean that 73% of predictions are correct; it means that, for a randomly chosen patient who died and a randomly chosen patient who was censored, the model orders the pair correctly about 73% of the time.
Associations and correlations should have scientific validity. In future analyses of this question, we could strengthen the model by collecting more data and by selecting features that are representative of the sample, so that the results can reach statistical significance. After internal validation, it is best practice to pilot the model in other geographic areas for external validation and to address any discrepancies before deploying it in the real world. Regulations should also ensure that the model is not misused by for-profit entities, for example to adjust insurance premiums based on health conditions, which could lead to disparities.
[2] https://sph.unc.edu/wp-content/uploads/sites/112/2015/07/nciph_ERIC11.pdf
[3] https://en.wikipedia.org/wiki/Bradford_Hill_criteria
[4] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589117
[6] https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
[7] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3760015/
[8] https://link.springer.com/article/10.1007/s00134-005-2763-5
[9] https://www.nature.com/articles/s41598-021-03397-3.pdf
[10] https://www.frontiersin.org/articles/10.3389/fcvm.2021.774935/full
[11] Schoe A, Bakhshi-Raiez F, de Keizer N, van Dissel JT, de Jonge E. Mortality prediction by SOFA score in ICU-patients after cardiac surgery; comparison with traditional prognostic models. BMC Anesthesiol. (2020) 20:65. doi: 10.1186/s12871-020-00975-2
[12] P. E. Marik, “Management of the critically ill geriatric patient,” Critical Care Medicine, vol. 34, no. 9, pp. S176–S182, 2006
[13] Tang, Y. D., & Katz, S. D. (2006). Anemia in chronic heart failure: prevalence, etiology, clinical correlates, and treatment options. Circulation, 113(20), 2454-2461
[14] Final Stages of Heart Failure: End-Stage Heart Failure. (2020, January 14). Samaritan. https://samaritannj.org/hospice-blog-and-events/hospice-palliative-care-blog/end-stage-heart-failure-what-to-expect/
[15] Aftab Haq, Sachin Patil, Alexis Lanteri Parcells, Ronald S. Chamberlain, “The Simplified Acute Physiology Score III Is Superior to the Simplified Acute Physiology Score II and Acute Physiology and Chronic Health Evaluation II in Predicting Surgical and ICU Mortality in the “Oldest Old””, Current Gerontology and Geriatrics Research, vol. 2014, Article ID 934852, 9 pages, 2014. https://doi.org/10.1155/2014/934852.